Active Reinforcement Learning: Observing Rewards at a Cost

نویسندگان

David Krueger

Jan Leike

چکیده

Active reinforcement learning (ARL) is a variant on reinforcement learning where the agent does not observe the reward unless it chooses to pay a query cost c > 0. The central question of ARL is how to quantify the long-term value of reward information. Even in multi-armed bandits, computing the value of this information is intractable and we have to rely on heuristics. We propose and evaluate several heuristic approaches for ARL in multi-armed bandits and (tabular) Markov decision processes, and discuss and illustrate some challenging aspects of the ARL problem.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Efficient exploration through active learning for value function approximation in reinforcement learning

Appropriately designing sampling policies is highly important for obtaining better control policies in reinforcement learning. In this paper, we first show that the least-squares policy iteration (LSPI) framework allows us to employ statistical active learning methods for linear regression. Then we propose a design method of good sampling policies for efficient exploration, which is particularl...

متن کامل

Active Learning of MDP Models

We consider the active learning problem of inferring the transition model of a Markov Decision Process by acting and observing transitions. This is particularly useful when no reward function is a priori defined. Our proposal is to cast the active learning task as a utility maximization problem using Bayesian reinforcement learning with belief-dependent rewards. After presenting three possible ...

متن کامل

Active Policy Iteration: Efficient Exploration through Active Learning for Value Function Approximation in Reinforcement Learning

متن کامل

Deep Reinforcement Fuzzing

Fuzzing is the process of finding security vulnerabilities in input-processing code by repeatedly testing the code with modified inputs. In this paper, we formalize fuzzing as a reinforcement learning problem using the concept of Markov decision processes. This in turn allows us to apply state-of-theart deep Q-learning algorithms that optimize rewards, which we define from runtime properties of...

متن کامل

Reinforcement learning for a vision based mobile robot

Reinforcement learning systems improve behaviour based on scalar rewards from a critic. In this work vision based behaviours, servoing and Wandering, are learned through a Q-learning method which handles continuous states and actions. There is no requirement for camera calibration, an actuator model, or a knowledgeable teacher. Learning thmugh observing the actions of other behaviours improves ...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2016

Active Reinforcement Learning: Observing Rewards at a Cost

نویسندگان

چکیده

منابع مشابه

Efficient exploration through active learning for value function approximation in reinforcement learning

Active Learning of MDP Models

Active Policy Iteration: Efficient Exploration through Active Learning for Value Function Approximation in Reinforcement Learning

Deep Reinforcement Fuzzing

Reinforcement learning for a vision based mobile robot

عنوان ژورنال:

اشتراک گذاری